# 16kHz sampling rate

Vits Icelandic Rosa Female Monospeaker
An Icelandic text-to-speech model fine-tuned from facebook/mms-tts-isl on the Talrómur dataset, specializing in female voice synthesis.
Speech Synthesis Transformers Other
Sigurdur
22
0
Whisper Small Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from openai/whisper-small for Japanese speech-to-text tasks.
Speech Recognition Transformers Japanese
Ivydata
356
5
Wav2vec2 Large Xlsr 53 Japanese
Apache-2.0
A Japanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.
Speech Recognition Transformers Japanese
Ivydata
19
4
Exp W2v2t Fa Hubert S801
Apache-2.0
A Persian automatic speech recognition model fine-tuned from facebook/hubert-large-ll60k, trained using the Common Voice 7.0 Persian dataset.
Speech Recognition Transformers Other
jonatasgrosman
16
0
Exp W2v2t Sv Se Vp Nl S842
Apache-2.0
A Swedish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli on the Common Voice 7.0 (sv-SE) dataset.
Speech Recognition Transformers
jonatasgrosman
16
0
Exp W2v2t Sv Se Wavlm S42
Apache-2.0
A Swedish automatic speech recognition model fine-tuned from microsoft/wavlm-large, suitable for 16kHz sampled audio input.
Speech Recognition Transformers
jonatasgrosman
20
0
Exp W2v2t Fr Vp Fr S438
Apache-2.0
A French automatic speech recognition model fine-tuned from facebook/wav2vec2-large-fr-voxpopuli on the Common Voice 7.0 French dataset.
Speech Recognition Transformers French
jonatasgrosman
20
0
Exp W2v2t Fr Unispeech S42
Apache-2.0
A French speech recognition model fine-tuned from microsoft/unispeech-large-1500h-cv on the Common Voice 7.0 (French) dataset.
Speech Recognition Transformers French
jonatasgrosman
20
0
Exp W2v2t It No Pretraining S842
Apache-2.0
An Italian speech recognition model fine-tuned from a randomly initialized wav2vec2 model, using the training split of Common Voice 7.0 (Italian).
Speech Recognition Transformers Other
jonatasgrosman
18
0
Exp W2v2t It Xlsr 53 S387
Apache-2.0
An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice 7.0 Italian dataset.
Speech Recognition Transformers Other
jonatasgrosman
18
0
Exp W2v2t It Vp 100k S449
Apache-2.0
An Italian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-100k-voxpopuli model, trained using the Common Voice 7.0 Italian dataset.
Speech Recognition Transformers Other
jonatasgrosman
17
0
Exp W2v2t It Wav2vec2 S609
Apache-2.0
An Italian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the Common Voice 7.0 Italian dataset.
Speech Recognition Transformers Other
jonatasgrosman
18
0
Exp W2v2t Th Hubert S533
Apache-2.0
A Thai speech recognition model fine-tuned from facebook/hubert-large-ll60k on Common Voice 7.0 data.
Speech Recognition Transformers Other
jonatasgrosman
19
0
Exp W2v2t Th Wav2vec2 S664
Apache-2.0
A Thai speech recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the Common Voice 7.0 dataset.
Speech Recognition Transformers Other
jonatasgrosman
14
0
Exp W2v2t En Vp Nl S281
Apache-2.0
An English speech recognition model fine-tuned from facebook/wav2vec2-large-nl-voxpopuli on the Common Voice 7.0 training set.
Speech Recognition Transformers English
jonatasgrosman
18
0
Exp W2v2t En No Pretraining S289
Apache-2.0
An English speech recognition model based on a randomly initialized wav2vec2 architecture and fine-tuned on the Common Voice 7.0 dataset.
Speech Recognition Transformers English
jonatasgrosman
18
0
Sharif Wav2vec2
MIT
A fine-tuned version of Sharif Wav2vec2 for the Persian language, trained on Common Voice Persian samples and supporting automatic speech recognition tasks.
Speech Recognition Transformers Other
SLPL
88
16
Data2vec Audio Large 960h
Apache-2.0
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech data, specifically optimized for automatic speech recognition tasks.
Speech Recognition Transformers English
facebook
2,531
7
Wav2vec2 Base Da Ft Nst
Apache-2.0
A Danish speech recognition model fine-tuned on the NST dataset, supporting 16kHz sampled audio input.
Speech Recognition Transformers Other
Alvenir
15
3
Wav2vec2 Large Xlsr 53 English
Apache-2.0
An English speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained on the Common Voice 6.1 dataset
Speech Recognition English
jonatasgrosman
251.78k
471
Wav2vec2 Large Xlsr Hindi
A Hindi automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on low-resource Indian language datasets.
Speech Recognition Transformers Other
theainerd
1.6M
7
Wav2vec2 Large Xlsr 53 Slovenian
Apache-2.0
A Slovenian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset, with a word error rate (WER) of 36.04%.
Speech Recognition Other
anton-l
15.02k
0
Wav2vec2 Large Xlsr Kazakh
Apache-2.0
This is a Kazakh automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on the Kazakh speech corpus v1.1 with a test WER of 19.65%.
Speech Recognition Other
aismlv
12.08k
17
Wav2vec2 Base Vietnamese
Apache-2.0
A Vietnamese speech recognition model based on the Wav2Vec2 architecture, fine-tuned on the VLSP dataset and supporting 16kHz sampled speech input.
Speech Recognition Transformers Other
dragonSwing
16
2
Wav2vec2 Large Xlsr 53 Finnish
Apache-2.0
A Finnish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.
Speech Recognition Other
Tommi
28
0
Wav2vec2 Large Xlsr 53 Eu
Apache-2.0
A Basque automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, achieving a 15.34% word error rate (WER) on the Common Voice Basque test set.
Speech Recognition Transformers Other
pcuenq
1,378
0
Wav2vec2 Large Xlsr Turkish Artificial
Apache-2.0
A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on an artificial Common Voice dataset.
Speech Recognition Other
cahya
25
1
Wav2vec2 Large Xlsr Hindi
Apache-2.0
A Hindi speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input.
Speech Recognition Transformers Other
skylord
82
2
Wav2vec2 Xls R 1b Italian
Apache-2.0
This is an Italian automatic speech recognition model based on the XLS-R 1B architecture, fine-tuned on multiple Italian datasets
Speech Recognition Transformers Other
jonatasgrosman
2,703
1
Wav2vec2 Large Xlsr Javanese
Apache-2.0
A Javanese automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality Javanese TTS data from OpenSLR.
Speech Recognition Other
cahya
659
0
Wav2vec2 Large Xlsr Sundanese
Apache-2.0
A Sundanese speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on high-quality TTS data from OpenSLR
Speech Recognition Other
cahya
339
0
Xlsr En Punctuation
Apache-2.0
An automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the English Common Voice dataset, supporting punctuation prediction.
Speech Recognition English
boris
30.28k
3
Wav2vec2 Xls R 1b Polish
Apache-2.0
A Polish automatic speech recognition (ASR) model fine-tuned from the XLS-R 1-billion-parameter model on datasets such as Common Voice 8.0, supporting 16kHz sampled audio input.
Speech Recognition Transformers Other
jonatasgrosman
212
0
Wav2vec2 Large Xlsr 53 Turkish
Apache-2.0
A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset.
Speech Recognition Other
aniltrkkn
68
0
Wav2vec2 Large Xlsr 53 Finnish
Apache-2.0
A Finnish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, supporting 16kHz sampled audio input
Speech Recognition Transformers Other
vasilis
27
0
Wav2vec2 Xlsr Khmer
Apache-2.0
A Khmer speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, achieving a WER of 24.96% on the OpenSLR Khmer dataset.
Speech Recognition Other
gagan3012
172
1
Wav2vec2 Large Xlsr 53 Frisian
Apache-2.0
An automatic speech recognition model fine-tuned for Frisian using the Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.
Speech Recognition
crang
22
0
Wav2vec2 Large Xlsr Indonesian
Apache-2.0
An Indonesian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Indonesian Common Voice dataset.
Speech Recognition Other
indonesian-nlp
89.58k
12
Wav2vec2 Large Xlsr 53 Spanish
Apache-2.0
This is an automatic speech recognition (ASR) model fine-tuned on the Spanish Common Voice dataset, based on the facebook/wav2vec2-large-xlsr-53 model.
Speech Recognition Spanish
mrm8488
38
2
Wav2vec2 Large Xlsr 53 Romanian
Apache-2.0
A Romanian automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Romanian dataset.
Speech Recognition Other
gmihaila
392
2
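
Many of the models listed above state that they expect 16kHz sampled audio input, so audio recorded at another rate generally has to be resampled before recognition. The following is only a minimal sketch of that step, assuming mono audio held as a plain list of float samples. The names `resample_linear` and `TARGET_RATE` are hypothetical and not taken from any model card, and linear interpolation is used purely for illustration: it applies no low-pass filter, so a dedicated audio library is likely a better choice for real downsampling.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # assumed target rate, per the "16kHz" notes in the listing


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = TARGET_RATE) -> List[float]:
    """Resample mono samples by linear interpolation (illustrative only)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if len(samples) == 0 or source_rate == target_rate:
        return list(samples)

    # Number of output samples covering the same duration as the input.
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1

    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)    # nearest input sample at or before pos
        right = min(left + 1, last)   # next input sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    one_second_48k = [0.0] * 48_000
    print(len(resample_linear(one_second_48k, 48_000)))
```

One second of 48kHz audio comes out as 16,000 samples, which is the length a 16kHz model would expect for the same duration.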
© 2025 AIbase